Das Abschlussprojekt besteht aus zwei Teilen:
1. Teil - Datenanalyse
2. Teil - Dashboard
Du arbeitest als Data Scientist bei der P-2-P-Plattform https://www.kiva.org/, die vor einem Jahr gegründet wurde. Nun wollt ihr euer Geschäft erweitern. Euer Team hat sich aufgeteilt und jeder Analyst hat einen Teilbereich der Daten. Deine Aufgabe ist es in einer explorativen Datenanalyse Insights für eure Plattform herauszufinden.
Anforderungen:
- vollständige Datenanalyse
- Dokumentation der Arbeitsschritte
- nachvollziehbare Erläuterungen über Vorgehensweise/Entscheidungen
- 3 verschiedenartige Plots
- Customizing der Plots
- Analyse der Plots
- Coding
Die Dokumentation kann in englisch oder deutsch erfolgen.
Treff ist Donnerstag um 14:30 Uhr im Hauptraum. Da werden wir die Unterlagen in dokumentensichere Formate umwandeln. Das machen wir zusammen. Danach habt ihr bis 15:35 Uhr zeit, eure zu bewertenden Unterlagen in den Projektabgabeordner hochzuladen.
Am Freitag erfolgt die Vorstellung der Plots innerhalb des Zeitrahmens von 5min - (+-1min) ist dabei ok.
Euer Geschäftsmodell ist das Betreiben einer Plattform (crowd-investing) bei der sich Personen die eine Geschäftsidee haben, aber nicht das benötigte Geld, anmelden und für ihr Projekt innerhalb einer vorgegebenen Zeit Geld sammeln können.
Auf der anderen Seite habt ihr Geldgeber, die gern ihr Geld in Projekte anlegen möchten und nach Investitionen suchen.
Als Vermittler bringt eure Plattform also Geldnehmer und Geldgeber zusammen. Ihr verdient euer Geld mit einer Provision für jedes Projekt was auf eurer Plattform landet. Der Geldgeber erhält einen Zins für die Geldleihe.
Deine Datenbasis ist die Historie eurer Plattform.
Alle Projekte sind abgeschlossene Projekte, d.h. die Zeit, um für sein Projekt Geld zu sammeln ist abgelaufen. Euer Geschäftsmodell sieht es vor, dass die gesammelten Gelder ausgezahlt werden, auch wenn der Zielbetrag nicht erreicht wurde.
- funded_amount ... mit Ablauf der "Crowding"zeit erhaltener Betrag/ ausgezahlter Betrag in USD
- loan_amount ... Zielbetrag (Betrag dem man für das Projekt erreichen wollte) in USD
- activity ... Unterkategory zu dem das Ziel des Crowdprojektes thematisch gehört
- sector ... Oberkategory in den das Crowdprojektes Thema fällt
- use ... Kurzbeschreibung wofür das Geld verwendet werden soll
- country_code ... Ländercode nach ISO Norm
- country ... Ländername nach ISO Norm
- region ... Region
- currency ... Währung in den der funded_amount dann ausgezahlt wurde
- term in months ... Dauer über die der Kredit ausgezahlt werden soll
- lender_count ...Darlehensgeber (also wieviele Personen Geld für das Projekt gegeben haben)
- borrower_genders ... Geschlecht und Anzahl der Darlehensnehmer, also diejenigen die das Crowdprojekt initiiert haben
- repayment interval ... vertraglich vereinbarte Rückzahlungsmodalitäten/-rhythmus
In this section i will go ahead to prepare my data for Data analysis. This section will be furher divided into the following subsections .
# Importing the necessary libraries
import pandas as pd
import numpy as np
#Reading my file
# I got a tokenizing error. I opened my file in excel to see what could be the problem.
#I discovered it had a # seperator
df = pd.read_csv("data_abschlussprojekt.csv", sep="#")
df
| Unnamed: 0 | funded_amount | loan_amount | activity | sector | use | country_code | country | region | currency | term_in_months | lender_count | borrower_genders | repayment_interval | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 300.0 | 300.0 | Fruits & Vegetables | Food | To buy seasonal, fresh fruits to sell. | PK | Pakistan | Lahore | PKR | 12.0 | 12 | female | irregular |
| 1 | 1 | 575.0 | 575.0 | Rickshaw | Transportation | to repair and maintain the auto rickshaw used ... | PK | Pakistan | Lahore | PKR | 11.0 | 14 | female, female | irregular |
| 2 | 2 | 150.0 | 150.0 | Transportation | Transportation | To repair their old cycle-van and buy another ... | IN | India | Maynaguri | INR | 43.0 | 6 | female | bullet |
| 3 | 3 | 200.0 | 200.0 | Embroidery | Arts | to purchase an embroidery machine and a variet... | PK | Pakistan | Lahore | PKR | 11.0 | 8 | female | irregular |
| 4 | 4 | 400.0 | 400.0 | Milk Sales | Food | to purchase one buffalo. | PK | Pakistan | Abdul Hakeem | PKR | 14.0 | 16 | female | monthly |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 671200 | 671200 | 0.0 | 25.0 | Livestock | Agriculture | [True, u'para compara: cemento, arenya y ladri... | PY | Paraguay | Concepción | USD | 13.0 | 0 | female | monthly |
| 671201 | 671201 | 25.0 | 25.0 | Livestock | Agriculture | [True, u'to start a turducken farm.'] - this l... | KE | Kenya | NaN | KES | 13.0 | 1 | female | monthly |
| 671202 | 671202 | 0.0 | 25.0 | Games | Entertainment | NaN | KE | Kenya | NaN | KES | 13.0 | 0 | NaN | monthly |
| 671203 | 671203 | 0.0 | 25.0 | Livestock | Agriculture | [True, u'to start a turducken farm.'] - this l... | KE | Kenya | NaN | KES | 13.0 | 0 | female | monthly |
| 671204 | 671204 | 0.0 | 25.0 | Livestock | Agriculture | [True, u'to start a turducken farm.'] - this l... | KE | Kenya | NaN | KES | 13.0 | 0 | female | monthly |
671205 rows × 14 columns
Viewing my columns
# Finding out my column names
df.columns
Index(['Unnamed: 0', ' funded_amount', ' loan_amount', ' activity', ' sector',
' use', ' country_code', ' country', ' region', ' currency',
' term_in_months', ' lender_count', ' borrower_genders',
' repayment_interval'],
dtype='object')
# I discover that my column names have trailing spaces at the end.
#I adjust for easy use
df.columns = df.columns.str.replace(' ', '')
df.columns
Index(['Unnamed:0', 'funded_amount', 'loan_amount', 'activity', 'sector',
'use', 'country_code', 'country', 'region', 'currency',
'term_in_months', 'lender_count', 'borrower_genders',
'repayment_interval'],
dtype='object')
# Let's drop the Unamed column
df = df.drop('Unnamed:0',axis=1)
Duplicate Rows
#Check if there are any duplicate rows
# I will find duplictaes through out the whole table
df.duplicated().sum()
24372
df.loc[df.duplicated()==True,:]
| funded_amount | loan_amount | activity | sector | use | country_code | country | region | currency | term_in_months | lender_count | borrower_genders | repayment_interval | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 498 | 100.0 | 100.0 | Home Energy | Personal Use | to buy a solar-powered lamp. | SV | El Salvador | NaN | USD | 14.0 | 4 | male | monthly |
| 606 | 100.0 | 100.0 | Home Energy | Personal Use | to buy a solar-powered lamp. | SV | El Salvador | NaN | USD | 14.0 | 4 | male | monthly |
| 808 | 450.0 | 450.0 | Higher education costs | Education | to pay for one semester's registration fees. | CO | Colombia | Bogotà | COP | 7.0 | 15 | female | monthly |
| 1703 | 500.0 | 500.0 | Higher education costs | Education | To buy a laptop for educational purposes. | SO | Somalia | Hargeisa | USD | 8.0 | 19 | male | monthly |
| 2317 | 250.0 | 250.0 | Poultry | Agriculture | to purchase poultry. | KE | Kenya | Ndaragwa | KES | 16.0 | 10 | female | monthly |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 671200 | 0.0 | 25.0 | Livestock | Agriculture | [True, u'para compara: cemento, arenya y ladri... | PY | Paraguay | Concepción | USD | 13.0 | 0 | female | monthly |
| 671201 | 25.0 | 25.0 | Livestock | Agriculture | [True, u'to start a turducken farm.'] - this l... | KE | Kenya | NaN | KES | 13.0 | 1 | female | monthly |
| 671202 | 0.0 | 25.0 | Games | Entertainment | NaN | KE | Kenya | NaN | KES | 13.0 | 0 | NaN | monthly |
| 671203 | 0.0 | 25.0 | Livestock | Agriculture | [True, u'to start a turducken farm.'] - this l... | KE | Kenya | NaN | KES | 13.0 | 0 | female | monthly |
| 671204 | 0.0 | 25.0 | Livestock | Agriculture | [True, u'to start a turducken farm.'] - this l... | KE | Kenya | NaN | KES | 13.0 | 0 | female | monthly |
24372 rows × 13 columns
How to deal with my duplicate rows
Because we do not have columns such as date , project Id and personal identification details of the borrowers, i will consider that any rows that has th eexact same contents in the columns , and especially in the 'use' column as a duplicate.
I will proceed to delete these columns and i have considered them as a double entry.
#Deleting my duplicated rows and proofing
df.drop_duplicates(inplace=True)
df.duplicated().sum()
0
Column Data Types
df.dtypes
funded_amount float64 loan_amount float64 activity object sector object use object country_code object country object region object currency object term_in_months float64 lender_count int64 borrower_genders object repayment_interval object dtype: object
Null Values
#Checking for columns with null values
df.isnull().sum()
funded_amount 0 loan_amount 0 activity 0 sector 0 use 3899 country_code 8 country 0 region 56163 currency 0 term_in_months 0 lender_count 0 borrower_genders 3888 repayment_interval 0 dtype: int64
df.isnull().sum() * 100 / len(df)
funded_amount 0.000000 loan_amount 0.000000 activity 0.000000 sector 0.000000 use 0.602783 country_code 0.001237 country 0.000000 region 8.682767 currency 0.000000 term_in_months 0.000000 lender_count 0.000000 borrower_genders 0.601083 repayment_interval 0.000000 dtype: float64
Indepth analysis of columns with null values
Use Column
#Overview of use column with null values
df.loc[(df.use.isnull()),:]
| funded_amount | loan_amount | activity | sector | use | country_code | country | region | currency | term_in_months | lender_count | borrower_genders | repayment_interval | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 140 | 2975.0 | 2975.0 | Food Production/Sales | Food | NaN | TZ | Tanzania | NaN | TZS | 10.0 | 110 | NaN | monthly |
| 145 | 1200.0 | 1200.0 | Personal Expenses | Personal Use | NaN | PE | Peru | NaN | PEN | 20.0 | 44 | NaN | monthly |
| 170 | 4250.0 | 4250.0 | Catering | Food | NaN | TZ | Tanzania | NaN | TZS | 10.0 | 116 | NaN | monthly |
| 412 | 2350.0 | 2350.0 | Beauty Salon | Services | NaN | TZ | Tanzania | NaN | TZS | 10.0 | 75 | NaN | monthly |
| 414 | 725.0 | 725.0 | Agriculture | Agriculture | NaN | SV | El Salvador | NaN | USD | 20.0 | 19 | NaN | monthly |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 659604 | 5625.0 | 10000.0 | Weaving | Arts | NaN | BT | Bhutan | NaN | USD | 14.0 | 210 | NaN | irregular |
| 660788 | 1975.0 | 1975.0 | Home Energy | Personal Use | NaN | PS | Palestine | NaN | USD | 27.0 | 39 | NaN | monthly |
| 661718 | 800.0 | 1600.0 | Furniture Making | Manufacturing | NaN | HT | Haiti | NaN | HTG | 13.0 | 27 | NaN | irregular |
| 671151 | 0.0 | 25.0 | Livestock | Agriculture | NaN | KE | Kenya | NaN | KES | 13.0 | 0 | NaN | monthly |
| 671174 | 0.0 | 25.0 | Games | Entertainment | NaN | KE | Kenya | NaN | KES | 13.0 | 0 | NaN | monthly |
3899 rows × 13 columns
Gender Column
df.loc[(df.borrower_genders.isnull()),:]
| funded_amount | loan_amount | activity | sector | use | country_code | country | region | currency | term_in_months | lender_count | borrower_genders | repayment_interval | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 140 | 2975.0 | 2975.0 | Food Production/Sales | Food | NaN | TZ | Tanzania | NaN | TZS | 10.0 | 110 | NaN | monthly |
| 145 | 1200.0 | 1200.0 | Personal Expenses | Personal Use | NaN | PE | Peru | NaN | PEN | 20.0 | 44 | NaN | monthly |
| 170 | 4250.0 | 4250.0 | Catering | Food | NaN | TZ | Tanzania | NaN | TZS | 10.0 | 116 | NaN | monthly |
| 412 | 2350.0 | 2350.0 | Beauty Salon | Services | NaN | TZ | Tanzania | NaN | TZS | 10.0 | 75 | NaN | monthly |
| 414 | 725.0 | 725.0 | Agriculture | Agriculture | NaN | SV | El Salvador | NaN | USD | 20.0 | 19 | NaN | monthly |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 659604 | 5625.0 | 10000.0 | Weaving | Arts | NaN | BT | Bhutan | NaN | USD | 14.0 | 210 | NaN | irregular |
| 660788 | 1975.0 | 1975.0 | Home Energy | Personal Use | NaN | PS | Palestine | NaN | USD | 27.0 | 39 | NaN | monthly |
| 661718 | 800.0 | 1600.0 | Furniture Making | Manufacturing | NaN | HT | Haiti | NaN | HTG | 13.0 | 27 | NaN | irregular |
| 671151 | 0.0 | 25.0 | Livestock | Agriculture | NaN | KE | Kenya | NaN | KES | 13.0 | 0 | NaN | monthly |
| 671174 | 0.0 | 25.0 | Games | Entertainment | NaN | KE | Kenya | NaN | KES | 13.0 | 0 | NaN | monthly |
3888 rows × 13 columns
# Does a null value in Gender column imply a null in Region and Use column?
df.loc[(df.borrower_genders.isnull()) & (df.region.notnull()),:]
| funded_amount | loan_amount | activity | sector | use | country_code | country | region | currency | term_in_months | lender_count | borrower_genders | repayment_interval |
|---|
df.loc[(df.borrower_genders.isnull()) & (df.use.notnull()),:]
| funded_amount | loan_amount | activity | sector | use | country_code | country | region | currency | term_in_months | lender_count | borrower_genders | repayment_interval |
|---|
# In how many countries do we have a null value for gender
missing_gender = df.loc[(df.borrower_genders.isnull()), "country"].nunique()
missing_gender
69
#How many countries are there in our data set
df.country.nunique()
87
Region Column
df.loc[(df.region.isnull()),:]
| funded_amount | loan_amount | activity | sector | use | country_code | country | region | currency | term_in_months | lender_count | borrower_genders | repayment_interval | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 5 | 250.0 | 250.0 | Services | Services | purchase leather for my business using ksh 20000. | KE | Kenya | NaN | KES | 4.0 | 6 | female | irregular |
| 49 | 450.0 | 450.0 | General Store | Retail | to stock his store. | SV | El Salvador | NaN | USD | 14.0 | 18 | male | monthly |
| 54 | 225.0 | 225.0 | Food Market | Food | to purchase various seasonal items to resell: ... | SN | Senegal | NaN | XOF | 14.0 | 7 | female | monthly |
| 67 | 125.0 | 125.0 | Energy | Services | purchase solar lanterns for resale. | KE | Kenya | NaN | KES | 3.0 | 6 | male | irregular |
| 70 | 2000.0 | 2000.0 | Retail | Retail | to install a display window and a sunshade for... | IQ | Iraq | NaN | USD | 15.0 | 71 | male | monthly |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 671166 | 25.0 | 25.0 | Livestock | Agriculture | [True, u'to start a turducken farm.'] - this l... | KE | Kenya | NaN | KES | 13.0 | 1 | female | monthly |
| 671167 | 0.0 | 25.0 | Livestock | Agriculture | Kiva Coordinator fixed issue loan (no longer v... | KE | Kenya | NaN | KES | 13.0 | 0 | female, female | monthly |
| 671168 | 0.0 | 25.0 | Livestock | Agriculture | Pretend the issue with loan got addressed by K... | KE | Kenya | NaN | KES | 13.0 | 0 | female | monthly |
| 671171 | 0.0 | 25.0 | Games | Entertainment | Kiva Coordinator replaced loan use. Should see... | KE | Kenya | NaN | KES | 13.0 | 0 | female, female | monthly |
| 671174 | 0.0 | 25.0 | Games | Entertainment | NaN | KE | Kenya | NaN | KES | 13.0 | 0 | NaN | monthly |
56163 rows × 13 columns
#how many countries have a null value for region
missing_region = df.loc[(df.region.isnull()), "country"].nunique()
missing_region
72
Country code column
df.loc[(df.country_code.isnull()),:]
| funded_amount | loan_amount | activity | sector | use | country_code | country | region | currency | term_in_months | lender_count | borrower_genders | repayment_interval | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 202537 | 4150.0 | 4150.0 | Wholesale | Wholesale | To purchase lighting products for sale to loca... | NaN | Namibia | EEnhana | NAD | 6.0 | 162 | female | bullet |
| 202823 | 4150.0 | 4150.0 | Wholesale | Wholesale | To purchase lighting products for sale to loca... | NaN | Namibia | Rundu | NAD | 6.0 | 159 | male | bullet |
| 344929 | 3325.0 | 3325.0 | Wholesale | Wholesale | To purchase lighting products for sale to loca... | NaN | Namibia | EEnhana | NAD | 7.0 | 120 | female | bullet |
| 351177 | 3325.0 | 3325.0 | Wholesale | Wholesale | To purchase lighting products for sale to loca... | NaN | Namibia | Rundu | NAD | 7.0 | 126 | male | bullet |
| 420953 | 3325.0 | 3325.0 | Wholesale | Wholesale | To purchase lighting products for sale to loca... | NaN | Namibia | EEnhana | NAD | 7.0 | 118 | female | bullet |
| 421218 | 4000.0 | 4000.0 | Wholesale | Wholesale | purchase solar lighting products for sale to l... | NaN | Namibia | Rundu | NAD | 7.0 | 150 | male | bullet |
| 487207 | 5100.0 | 5100.0 | Renewable Energy Products | Retail | to pay for stock of solar lights and cell phon... | NaN | Namibia | Katima Mulilo | NAD | 7.0 | 183 | male | bullet |
| 487653 | 5000.0 | 5000.0 | Wholesale | Wholesale | to maintain a stock of solar lights and cell p... | NaN | Namibia | Oshakati | NAD | 7.0 | 183 | female | bullet |
#I want to find out if there are other applications from Namibia that have a country code imputed
df.loc[(df.country=="Namibia"),:]
#There are no other rows
| funded_amount | loan_amount | activity | sector | use | country_code | country | region | currency | term_in_months | lender_count | borrower_genders | repayment_interval | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 202537 | 4150.0 | 4150.0 | Wholesale | Wholesale | To purchase lighting products for sale to loca... | NaN | Namibia | EEnhana | NAD | 6.0 | 162 | female | bullet |
| 202823 | 4150.0 | 4150.0 | Wholesale | Wholesale | To purchase lighting products for sale to loca... | NaN | Namibia | Rundu | NAD | 6.0 | 159 | male | bullet |
| 344929 | 3325.0 | 3325.0 | Wholesale | Wholesale | To purchase lighting products for sale to loca... | NaN | Namibia | EEnhana | NAD | 7.0 | 120 | female | bullet |
| 351177 | 3325.0 | 3325.0 | Wholesale | Wholesale | To purchase lighting products for sale to loca... | NaN | Namibia | Rundu | NAD | 7.0 | 126 | male | bullet |
| 420953 | 3325.0 | 3325.0 | Wholesale | Wholesale | To purchase lighting products for sale to loca... | NaN | Namibia | EEnhana | NAD | 7.0 | 118 | female | bullet |
| 421218 | 4000.0 | 4000.0 | Wholesale | Wholesale | purchase solar lighting products for sale to l... | NaN | Namibia | Rundu | NAD | 7.0 | 150 | male | bullet |
| 487207 | 5100.0 | 5100.0 | Renewable Energy Products | Retail | to pay for stock of solar lights and cell phon... | NaN | Namibia | Katima Mulilo | NAD | 7.0 | 183 | male | bullet |
| 487653 | 5000.0 | 5000.0 | Wholesale | Wholesale | to maintain a stock of solar lights and cell p... | NaN | Namibia | Oshakati | NAD | 7.0 | 183 | female | bullet |
Our findings on Null Columns and the way forward.
From our investigation with the columns that have null values, the following was found:
Use Column
Therefore i will assume that the lack of values in this column is as result of negligence. Perharps the applicants forgot to state their specific uses, or the data base administrator ommitted it.
I will not delete these rows because funds were still provided for these applications and these rows may still aid in our analysis.
I will replace the null values with "not_provided"
Gender Column
I will also assume a case of negligence in this column.
Because loans were still provided to applicants with no gender given i will not delete these rows.
However i will replace the null values with the modal value of the gender column.
Region Column
I will not delete the rows as it constitute a moderate part of our data set. However i will proceed to delete the whole column.
I believe that the country column is necessary enough.
Country_code Column
I will not delete these rows as these are the only rows or applications from Namibia. I will proceed to replace the value with "Nam" which is the universal country code for Namimbia. I will therefore conclude that this was an entry error. "Nan" was written instead of "Nam".
Update I later on realised that this data set follows the Aplha two code (NA) instead of the alpha three code (NAM). I later on changed the code from"Nam" to "NA"
1.Replacing null values in Use column with "not_provided"
df["use"].fillna("not_provided", inplace = True)
2.Replace Null values in gender with the mode of the gender column
df['borrower_genders'] = df['borrower_genders'].fillna(df['borrower_genders'].mode()[0])
3.Deleting the Region column
df.drop(columns=['region'],axis=7, inplace=True)
4.Replacing null values in county_code columnn with "Nam"
df["country_code"].fillna("Nam", inplace = True)
# Realised that this data set takes the alpha 2 code instead of the alpha 3 code (ISO norms), so will replace "Nam" with "NA"
df.loc[(df.country_code=="Nam"),"country_code"] ="NA"
#Proofing
df.loc[(df.country=="Namibia"),:]
| funded_amount | loan_amount | activity | sector | use | country_code | country | currency | term_in_months | lender_count | borrower_genders | repayment_interval | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 202537 | 4150.0 | 4150.0 | Wholesale | Wholesale | To purchase lighting products for sale to loca... | NA | Namibia | NAD | 6.0 | 162 | female | bullet |
| 202823 | 4150.0 | 4150.0 | Wholesale | Wholesale | To purchase lighting products for sale to loca... | NA | Namibia | NAD | 6.0 | 159 | male | bullet |
| 344929 | 3325.0 | 3325.0 | Wholesale | Wholesale | To purchase lighting products for sale to loca... | NA | Namibia | NAD | 7.0 | 120 | female | bullet |
| 351177 | 3325.0 | 3325.0 | Wholesale | Wholesale | To purchase lighting products for sale to loca... | NA | Namibia | NAD | 7.0 | 126 | male | bullet |
| 420953 | 3325.0 | 3325.0 | Wholesale | Wholesale | To purchase lighting products for sale to loca... | NA | Namibia | NAD | 7.0 | 118 | female | bullet |
| 421218 | 4000.0 | 4000.0 | Wholesale | Wholesale | purchase solar lighting products for sale to l... | NA | Namibia | NAD | 7.0 | 150 | male | bullet |
| 487207 | 5100.0 | 5100.0 | Renewable Energy Products | Retail | to pay for stock of solar lights and cell phon... | NA | Namibia | NAD | 7.0 | 183 | male | bullet |
| 487653 | 5000.0 | 5000.0 | Wholesale | Wholesale | to maintain a stock of solar lights and cell p... | NA | Namibia | NAD | 7.0 | 183 | female | bullet |
#Proofing Null values
df.isnull().sum()
funded_amount 0 loan_amount 0 activity 0 sector 0 use 0 country_code 0 country 0 currency 0 term_in_months 0 lender_count 0 borrower_genders 0 repayment_interval 0 dtype: int64
#Doing some final checks
df.repayment_interval.unique()
array(['irregular', 'bullet', 'monthly', 'weekly'], dtype=object)
I will be creating some extra columns which i may or may not use for my analyis. However i think it is important to get some more insights from this data set:
I noticed that the duration of payment can range from one month to 158 months. It will be interesting to know if this is dependent on any other factor found in our data set. The website does not specify what the maximum disbursement duration is and if this depends on the sector or the amount of capital asked.
I will be dividing the values of the duration column in four categories: short_term_loan,medium_term_loan,long_term_loan,extra_long_term_loan.The categorization of these loans were done according to finance principles give and take and taken into consideration our Busienss Case.
a. short term 1mo-24mo
b. medium term 25mo-60mo
c. long term 61mo-120mo
d. extra long term 121mo and above
This new column will tell us the rate at which requested amounts (loan_amount) are being financed(funded_amount). To calculte the values of this column i will use teh followingg formular
funded amount/loan amount *100
Here i will divide the the funded_loan column between loans that were funded (percentage of funding) and not funded. This column will bbe built based on the results of the rate of financing column.
a. Not funded 0
b. Very poor funding 1-30
c. Poor Funding 30-49
d. Avg Funding 50-80
e. Good Funding 81-99
f. Fully Funded 100
1.term_category column
#min and max values for this column
df.term_in_months.max()
158.0
df.term_in_months.min()
1.0
df.term_in_months.unique()
array([ 12., 11., 43., 14., 4., 13., 10., 8., 5., 20., 7.,
3., 17., 15., 9., 39., 23., 26., 6., 30., 22., 25.,
27., 16., 52., 18., 36., 19., 28., 62., 32., 24., 21.,
50., 38., 2., 35., 72., 137., 1., 49., 33., 42., 29.,
37., 51., 113., 79., 31., 44., 74., 34., 48., 114., 40.,
124., 104., 63., 85., 78., 70., 86., 61., 60., 67., 55.,
53., 41., 68., 143., 77., 130., 45., 111., 134., 107., 142.,
148., 56., 122., 133., 141., 110., 81., 106., 54., 147., 112.,
59., 145., 121., 109., 80., 47., 97., 75., 101., 128., 98.,
87., 71., 66., 46., 125., 76., 73., 120., 144., 118., 131.,
65., 108., 58., 123., 84., 99., 82., 92., 69., 91., 57.,
90., 93., 129., 89., 88., 64., 126., 138., 158., 83., 100.,
105., 132., 96., 127., 135., 95., 154., 156., 94., 115., 102.,
116., 136., 103., 139., 146.])
#creating categories for this column
category = ["short_term_loan", "medium_term_loan", "long_term_loan","extra_long_term_loan"]
max_value = df.loc[:,"term_in_months"].max()
min_value = df.loc[:,"term_in_months"].min()
term_range = [min_value - .001 * abs(min_value), 25, 61, 121,max_value + .001 * abs(max_value)]
df["term_cat"] = pd.cut(x=df.loc[:,"term_in_months"],
bins=term_range,
labels=category
)
df["term_label"] = pd.cut(x=df.loc[:,"term_in_months"],
bins=term_range,
)
df.head(2)
| funded_amount | loan_amount | activity | sector | use | country_code | country | currency | term_in_months | lender_count | borrower_genders | repayment_interval | term_cat | term_label | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 300.0 | 300.0 | Fruits & Vegetables | Food | To buy seasonal, fresh fruits to sell. | PK | Pakistan | PKR | 12.0 | 12 | female | irregular | short_term_loan | (0.999, 25.0] |
| 1 | 575.0 | 575.0 | Rickshaw | Transportation | to repair and maintain the auto rickshaw used ... | PK | Pakistan | PKR | 11.0 | 14 | female, female | irregular | short_term_loan | (0.999, 25.0] |
2. rate_of_financing(%) column
df["rate_of_financing(%)"]=round((df['funded_amount']/df['loan_amount'])*100,2)
df.head(2)
| funded_amount | loan_amount | activity | sector | use | country_code | country | currency | term_in_months | lender_count | borrower_genders | repayment_interval | term_cat | term_label | rate_of_financing(%) | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 300.0 | 300.0 | Fruits & Vegetables | Food | To buy seasonal, fresh fruits to sell. | PK | Pakistan | PKR | 12.0 | 12 | female | irregular | short_term_loan | (0.999, 25.0] | 100.0 |
| 1 | 575.0 | 575.0 | Rickshaw | Transportation | to repair and maintain the auto rickshaw used ... | PK | Pakistan | PKR | 11.0 | 14 | female, female | irregular | short_term_loan | (0.999, 25.0] | 100.0 |
3.loan_status column
conditions = [
(df['rate_of_financing(%)'] == 0),
(df['rate_of_financing(%)'] > 0) & (df['rate_of_financing(%)'] <= 30),
(df['rate_of_financing(%)'] > 30) & (df['rate_of_financing(%)'] <= 49),
(df['rate_of_financing(%)'] > 49) & (df['rate_of_financing(%)'] <= 80),
(df['rate_of_financing(%)'] > 80) & (df['rate_of_financing(%)'] < 100),
(df['rate_of_financing(%)'] == 100 ),
(df['rate_of_financing(%)'] > 100 )
]
choices = ["not_funded","very_poor_funding","poor_funding" ,"avg_funding", "good_funding","fully_funded","over_funded"]
df['loan_status'] = np.select(conditions,choices, default=0)
df.head(2)
| funded_amount | loan_amount | activity | sector | use | country_code | country | currency | term_in_months | lender_count | borrower_genders | repayment_interval | term_cat | term_label | rate_of_financing(%) | loan_status | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 300.0 | 300.0 | Fruits & Vegetables | Food | To buy seasonal, fresh fruits to sell. | PK | Pakistan | PKR | 12.0 | 12 | female | irregular | short_term_loan | (0.999, 25.0] | 100.0 | fully_funded |
| 1 | 575.0 | 575.0 | Rickshaw | Transportation | to repair and maintain the auto rickshaw used ... | PK | Pakistan | PKR | 11.0 | 14 | female, female | irregular | short_term_loan | (0.999, 25.0] | 100.0 | fully_funded |
In this subsection we are going to check our data types again and make sure that they are in a format that can optimize data storage.
df.dtypes
funded_amount float64 loan_amount float64 activity object sector object use object country_code object country object currency object term_in_months float64 lender_count int64 borrower_genders object repayment_interval object term_cat category term_label category rate_of_financing(%) float64 loan_status object dtype: object
#Changing the columns below to category type since the values in this column are repetitive
df_1=df.copy()
list_category = ['activity','sector','country_code','country','repayment_interval','loan_status','currency']
for column in list_category:
df_1.loc[:,column] = df.loc[:,column].astype('category')
df.term_in_months.unique()
array([ 12., 11., 43., 14., 4., 13., 10., 8., 5., 20., 7.,
3., 17., 15., 9., 39., 23., 26., 6., 30., 22., 25.,
27., 16., 52., 18., 36., 19., 28., 62., 32., 24., 21.,
50., 38., 2., 35., 72., 137., 1., 49., 33., 42., 29.,
37., 51., 113., 79., 31., 44., 74., 34., 48., 114., 40.,
124., 104., 63., 85., 78., 70., 86., 61., 60., 67., 55.,
53., 41., 68., 143., 77., 130., 45., 111., 134., 107., 142.,
148., 56., 122., 133., 141., 110., 81., 106., 54., 147., 112.,
59., 145., 121., 109., 80., 47., 97., 75., 101., 128., 98.,
87., 71., 66., 46., 125., 76., 73., 120., 144., 118., 131.,
65., 108., 58., 123., 84., 99., 82., 92., 69., 91., 57.,
90., 93., 129., 89., 88., 64., 126., 138., 158., 83., 100.,
105., 132., 96., 127., 135., 95., 154., 156., 94., 115., 102.,
116., 136., 103., 139., 146.])
#the term_in_months column has only whole numbers as values. So i will change it to interger
df_1['term_in_months'] = df_1['term_in_months'].astype(int)
#Proofing
df_1.dtypes
funded_amount float64 loan_amount float64 activity category sector category use object country_code category country category currency category term_in_months int32 lender_count int64 borrower_genders object repayment_interval category term_cat category term_label category rate_of_financing(%) float64 loan_status category dtype: object
In this section i will be analysing my dataset first as a whole an dwill then dig further to analyse based on sections depending on what i find in my original analysis. I will be using data aggregation and data viz to aid in my analysis.
df_f = df.groupby(by='loan_status',as_index=False).agg(count_cat=('loan_status','count')).sort_values(['count_cat'],ascending=False)
df_f
| loan_status | count_cat | |
|---|---|---|
| 1 | fully_funded | 598796 |
| 0 | avg_funding | 17808 |
| 6 | very_poor_funding | 11878 |
| 5 | poor_funding | 9946 |
| 2 | good_funding | 5125 |
| 3 | not_funded | 3278 |
| 4 | over_funded | 2 |
import plotly.express as px
fig = px.pie(df_f,
values='count_cat',
names='loan_status',
title= "Fig 1.0 : Project funding KIVA 2016"
)
fig.show()
Seeing as a very huge part of the applications were fully funded (92.6%), it means that the Kiva Organization has a good funding rate. It would however be intersting to see why some loans were not funded and if they share anything in common.
df.loc[df["loan_status"]== "not_funded",:]
| funded_amount | loan_amount | activity | sector | use | country_code | country | currency | term_in_months | lender_count | borrower_genders | repayment_interval | term_cat | term_label | rate_of_financing(%) | loan_status | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 4814 | 0.0 | 5000.0 | Food Production/Sales | Food | grow and create new jobs in my company. | US | United States | USD | 24.0 | 0 | female | bullet | short_term_loan | (0.999, 25.0] | 0.0 | not_funded |
| 6784 | 0.0 | 10000.0 | Cosmetics Sales | Retail | start a new stream of income by offering retai... | US | United States | USD | 36.0 | 0 | male | bullet | medium_term_loan | (25.0, 61.0] | 0.0 | not_funded |
| 10735 | 0.0 | 5000.0 | Services | Services | purchase new equipment and market to more pote... | US | United States | USD | 24.0 | 0 | male | bullet | short_term_loan | (0.999, 25.0] | 0.0 | not_funded |
| 12037 | 0.0 | 5000.0 | Food Production/Sales | Food | purchase much needed packaging that is sustain... | US | United States | USD | 24.0 | 0 | male | bullet | short_term_loan | (0.999, 25.0] | 0.0 | not_funded |
| 13542 | 0.0 | 1500.0 | Crafts | Arts | increase my inventory of handmade candles, dri... | US | United States | USD | 12.0 | 0 | male | bullet | short_term_loan | (0.999, 25.0] | 0.0 | not_funded |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 671179 | 0.0 | 25.0 | Livestock | Agriculture | [True, u'para compara: cemento, arenya y ladri... | PY | Paraguay | USD | 13.0 | 0 | female | monthly | short_term_loan | (0.999, 25.0] | 0.0 | not_funded |
| 671181 | 0.0 | 25.0 | Livestock | Agriculture | Reviewed loan use in english. | PK | Pakistan | PKR | 13.0 | 0 | female | monthly | short_term_loan | (0.999, 25.0] | 0.0 | not_funded |
| 671182 | 0.0 | 125.0 | Livestock | Agriculture | Pretend the flagged issue was addressed by KC. | MX | Mexico | MXN | 13.0 | 0 | female, female | monthly | short_term_loan | (0.999, 25.0] | 0.0 | not_funded |
| 671184 | 0.0 | 875.0 | Livestock | Agriculture | Translated loan use to english. | BO | Bolivia | BOB | 13.0 | 0 | female, female | monthly | short_term_loan | (0.999, 25.0] | 0.0 | not_funded |
| 671188 | 0.0 | 250.0 | Livestock | Agriculture | Reviewed loan use in english. | GH | Ghana | GHS | 13.0 | 0 | female | monthly | short_term_loan | (0.999, 25.0] | 0.0 | not_funded |
3278 rows × 16 columns
Note
From the above table i see that perharps the use could have a part to play and maybe where the applications came from? I will dig deeper to see if my assumptions are correct.
#wheh a loan is a fake or when the loan was not originally applied for in english does it affect its funding ?
df.loc[df["use"].str.startswith('Pretend') | df["use"].str.startswith('Kiva')|df["use"].str.startswith('Translated')|df["use"].str.startswith('Reviewed') ,:]
| funded_amount | loan_amount | activity | sector | use | country_code | country | currency | term_in_months | lender_count | borrower_genders | repayment_interval | term_cat | term_label | rate_of_financing(%) | loan_status | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 657556 | 0.0 | 1000.0 | Tourism | Services | Reviewed loan use in english. | KH | Cambodia | KHR | 12.0 | 0 | female, female | monthly | short_term_loan | (0.999, 25.0] | 0.00 | not_funded |
| 657584 | 0.0 | 700.0 | Tourism | Services | Reviewed loan use in english. | KH | Cambodia | KHR | 9.0 | 0 | female, female | monthly | short_term_loan | (0.999, 25.0] | 0.00 | not_funded |
| 657717 | 0.0 | 1975.0 | Tourism | Services | Reviewed loan use in english. | KH | Cambodia | KHR | 25.0 | 0 | female, female, female, female | monthly | short_term_loan | (0.999, 25.0] | 0.00 | not_funded |
| 658107 | 0.0 | 925.0 | Tourism | Services | Reviewed loan use in english. | KH | Cambodia | KHR | 11.0 | 0 | female, female | monthly | short_term_loan | (0.999, 25.0] | 0.00 | not_funded |
| 658437 | 0.0 | 2075.0 | Tourism | Services | Reviewed loan use in english. | KH | Cambodia | KHR | 26.0 | 0 | female, female, female | monthly | medium_term_loan | (25.0, 61.0] | 0.00 | not_funded |
| 659111 | 0.0 | 1000.0 | Tourism | Services | Reviewed loan use in english. | KH | Cambodia | KHR | 20.0 | 0 | female, female | monthly | short_term_loan | (0.999, 25.0] | 0.00 | not_funded |
| 661362 | 0.0 | 625.0 | Transportation | Transportation | Translated loan use to english. | PE | Peru | PEN | 8.0 | 0 | male | monthly | short_term_loan | (0.999, 25.0] | 0.00 | not_funded |
| 661679 | 0.0 | 1550.0 | Dental | Health | Translated loan use to english. | PE | Peru | PEN | 8.0 | 0 | male | monthly | short_term_loan | (0.999, 25.0] | 0.00 | not_funded |
| 661941 | 0.0 | 925.0 | Tourism | Services | Translated loan use to english. | PE | Peru | PEN | 8.0 | 0 | male | monthly | short_term_loan | (0.999, 25.0] | 0.00 | not_funded |
| 661956 | 0.0 | 1550.0 | Restaurant | Food | Translated loan use to english. | PE | Peru | PEN | 8.0 | 0 | female | monthly | short_term_loan | (0.999, 25.0] | 0.00 | not_funded |
| 662377 | 0.0 | 6425.0 | Sewing | Services | Translated loan use to english. | PE | Peru | PEN | 6.0 | 0 | male, female, male, male, male, male, female, ... | irregular | short_term_loan | (0.999, 25.0] | 0.00 | not_funded |
| 662552 | 0.0 | 6575.0 | Grocery Store | Food | Translated loan use to english. | PE | Peru | PEN | 8.0 | 0 | male, male, female, female, female, female, fe... | monthly | short_term_loan | (0.999, 25.0] | 0.00 | not_funded |
| 662746 | 0.0 | 625.0 | Cosmetics Sales | Retail | Translated loan use to english. | PE | Peru | PEN | 8.0 | 0 | female | monthly | short_term_loan | (0.999, 25.0] | 0.00 | not_funded |
| 662793 | 0.0 | 625.0 | Electronics Sales | Retail | Translated loan use to english. | PE | Peru | PEN | 8.0 | 0 | male | monthly | short_term_loan | (0.999, 25.0] | 0.00 | not_funded |
| 662846 | 0.0 | 1225.0 | Tourism | Services | Translated loan use to english. | PE | Peru | PEN | 8.0 | 0 | male | monthly | short_term_loan | (0.999, 25.0] | 0.00 | not_funded |
| 662849 | 0.0 | 775.0 | Clothing Sales | Clothing | Translated loan use to english. | PE | Peru | PEN | 8.0 | 0 | female | monthly | short_term_loan | (0.999, 25.0] | 0.00 | not_funded |
| 663601 | 0.0 | 5525.0 | Retail | Retail | Translated loan use to english. | PE | Peru | PEN | 7.0 | 0 | male, female, female, female, female, female, ... | monthly | short_term_loan | (0.999, 25.0] | 0.00 | not_funded |
| 663824 | 0.0 | 1075.0 | Home Appliances | Personal Use | Translated loan use to english. | PE | Peru | PEN | 6.0 | 0 | female | irregular | short_term_loan | (0.999, 25.0] | 0.00 | not_funded |
| 663855 | 0.0 | 1700.0 | Transportation | Transportation | Translated loan use to english. | PE | Peru | PEN | 8.0 | 0 | female, female, female, female, male | monthly | short_term_loan | (0.999, 25.0] | 0.00 | not_funded |
| 664078 | 0.0 | 1250.0 | Retail | Retail | Translated loan use to english. | PE | Peru | PEN | 6.0 | 0 | female | irregular | short_term_loan | (0.999, 25.0] | 0.00 | not_funded |
| 670969 | 0.0 | 675.0 | Retail | Retail | Translated loan use to english. | CO | Colombia | COP | 26.0 | 0 | female | bullet | medium_term_loan | (25.0, 61.0] | 0.00 | not_funded |
| 671027 | 0.0 | 300.0 | Cattle | Agriculture | Translated loan use to english. | SV | El Salvador | USD | 15.0 | 0 | male | bullet | short_term_loan | (0.999, 25.0] | 0.00 | not_funded |
| 671092 | 0.0 | 1000.0 | Cattle | Agriculture | Translated loan use to english. | EC | Ecuador | USD | 38.0 | 0 | female | monthly | medium_term_loan | (25.0, 61.0] | 0.00 | not_funded |
| 671150 | 0.0 | 75.0 | Livestock | Agriculture | Pretend the issue with spanish loan was addres... | MX | Mexico | MXN | 13.0 | 0 | female | monthly | short_term_loan | (0.999, 25.0] | 0.00 | not_funded |
| 671152 | 25.0 | 25.0 | Livestock | Agriculture | Reviewed loan use in english. | PK | Pakistan | PKR | 13.0 | 1 | female | monthly | short_term_loan | (0.999, 25.0] | 100.00 | fully_funded |
| 671153 | 75.0 | 125.0 | Livestock | Agriculture | Pretend the flagged issue was addressed by KC. | MX | Mexico | MXN | 13.0 | 3 | female, female | monthly | short_term_loan | (0.999, 25.0] | 60.00 | avg_funding |
| 671154 | 0.0 | 725.0 | Livestock | Agriculture | Translated loan use to english. | BO | Bolivia | BOB | 13.0 | 0 | female | monthly | short_term_loan | (0.999, 25.0] | 0.00 | not_funded |
| 671155 | 50.0 | 875.0 | Livestock | Agriculture | Translated loan use to english. | BO | Bolivia | BOB | 13.0 | 2 | female, female | monthly | short_term_loan | (0.999, 25.0] | 5.71 | very_poor_funding |
| 671156 | 75.0 | 250.0 | Livestock | Agriculture | Reviewed loan use in english. | GH | Ghana | GHS | 13.0 | 3 | female | monthly | short_term_loan | (0.999, 25.0] | 30.00 | very_poor_funding |
| 671157 | 75.0 | 75.0 | Livestock | Agriculture | Pretend the issue with spanish loan was addres... | MX | Mexico | MXN | 13.0 | 3 | female | monthly | short_term_loan | (0.999, 25.0] | 100.00 | fully_funded |
| 671159 | 50.0 | 725.0 | Livestock | Agriculture | Translated loan use to english. | BO | Bolivia | BOB | 13.0 | 2 | female | monthly | short_term_loan | (0.999, 25.0] | 6.90 | very_poor_funding |
| 671167 | 0.0 | 25.0 | Livestock | Agriculture | Kiva Coordinator fixed issue loan (no longer v... | KE | Kenya | KES | 13.0 | 0 | female, female | monthly | short_term_loan | (0.999, 25.0] | 0.00 | not_funded |
| 671168 | 0.0 | 25.0 | Livestock | Agriculture | Pretend the issue with loan got addressed by K... | KE | Kenya | KES | 13.0 | 0 | female | monthly | short_term_loan | (0.999, 25.0] | 0.00 | not_funded |
| 671171 | 0.0 | 25.0 | Games | Entertainment | Kiva Coordinator replaced loan use. Should see... | KE | Kenya | KES | 13.0 | 0 | female, female | monthly | short_term_loan | (0.999, 25.0] | 0.00 | not_funded |
| 671181 | 0.0 | 25.0 | Livestock | Agriculture | Reviewed loan use in english. | PK | Pakistan | PKR | 13.0 | 0 | female | monthly | short_term_loan | (0.999, 25.0] | 0.00 | not_funded |
| 671182 | 0.0 | 125.0 | Livestock | Agriculture | Pretend the flagged issue was addressed by KC. | MX | Mexico | MXN | 13.0 | 0 | female, female | monthly | short_term_loan | (0.999, 25.0] | 0.00 | not_funded |
| 671184 | 0.0 | 875.0 | Livestock | Agriculture | Translated loan use to english. | BO | Bolivia | BOB | 13.0 | 0 | female, female | monthly | short_term_loan | (0.999, 25.0] | 0.00 | not_funded |
| 671188 | 0.0 | 250.0 | Livestock | Agriculture | Reviewed loan use in english. | GH | Ghana | GHS | 13.0 | 0 | female | monthly | short_term_loan | (0.999, 25.0] | 0.00 | not_funded |
df.loc[(df.use == 'not_provided') & (df.funded_amount== 0),:]
| funded_amount | loan_amount | activity | sector | use | country_code | country | currency | term_in_months | lender_count | borrower_genders | repayment_interval | term_cat | term_label | rate_of_financing(%) | loan_status | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 36075 | 0.0 | 600.0 | Personal Medical Expenses | Health | not_provided | BO | Bolivia | USD | 20.0 | 0 | female | monthly | short_term_loan | (0.999, 25.0] | 0.0 | not_funded |
| 40489 | 0.0 | 1050.0 | Cloth & Dressmaking Supplies | Retail | not_provided | KE | Kenya | KES | 14.0 | 0 | female | monthly | short_term_loan | (0.999, 25.0] | 0.0 | not_funded |
| 53392 | 0.0 | 1025.0 | Retail | Retail | not_provided | NI | Nicaragua | USD | 17.0 | 0 | female | monthly | short_term_loan | (0.999, 25.0] | 0.0 | not_funded |
| 59399 | 0.0 | 325.0 | Agriculture | Agriculture | not_provided | SV | El Salvador | USD | 13.0 | 0 | female | bullet | short_term_loan | (0.999, 25.0] | 0.0 | not_funded |
| 61296 | 0.0 | 500.0 | Farming | Agriculture | not_provided | SV | El Salvador | USD | 14.0 | 0 | female | irregular | short_term_loan | (0.999, 25.0] | 0.0 | not_funded |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 643472 | 0.0 | 425.0 | Farming | Agriculture | not_provided | UG | Uganda | UGX | 12.0 | 0 | female | monthly | short_term_loan | (0.999, 25.0] | 0.0 | not_funded |
| 646452 | 0.0 | 3000.0 | Clothing | Clothing | not_provided | US | United States | USD | 24.0 | 0 | female | bullet | short_term_loan | (0.999, 25.0] | 0.0 | not_funded |
| 647114 | 0.0 | 5000.0 | Education provider | Education | not_provided | US | United States | USD | 18.0 | 0 | female | bullet | short_term_loan | (0.999, 25.0] | 0.0 | not_funded |
| 671151 | 0.0 | 25.0 | Livestock | Agriculture | not_provided | KE | Kenya | KES | 13.0 | 0 | female | monthly | short_term_loan | (0.999, 25.0] | 0.0 | not_funded |
| 671174 | 0.0 | 25.0 | Games | Entertainment | not_provided | KE | Kenya | KES | 13.0 | 0 | female | monthly | short_term_loan | (0.999, 25.0] | 0.0 | not_funded |
115 rows × 16 columns
df.loc[(df.use == 'not_provided') ,:]
| funded_amount | loan_amount | activity | sector | use | country_code | country | currency | term_in_months | lender_count | borrower_genders | repayment_interval | term_cat | term_label | rate_of_financing(%) | loan_status | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 140 | 2975.0 | 2975.0 | Food Production/Sales | Food | not_provided | TZ | Tanzania | TZS | 10.0 | 110 | female | monthly | short_term_loan | (0.999, 25.0] | 100.00 | fully_funded |
| 145 | 1200.0 | 1200.0 | Personal Expenses | Personal Use | not_provided | PE | Peru | PEN | 20.0 | 44 | female | monthly | short_term_loan | (0.999, 25.0] | 100.00 | fully_funded |
| 170 | 4250.0 | 4250.0 | Catering | Food | not_provided | TZ | Tanzania | TZS | 10.0 | 116 | female | monthly | short_term_loan | (0.999, 25.0] | 100.00 | fully_funded |
| 412 | 2350.0 | 2350.0 | Beauty Salon | Services | not_provided | TZ | Tanzania | TZS | 10.0 | 75 | female | monthly | short_term_loan | (0.999, 25.0] | 100.00 | fully_funded |
| 414 | 725.0 | 725.0 | Agriculture | Agriculture | not_provided | SV | El Salvador | USD | 20.0 | 19 | female | monthly | short_term_loan | (0.999, 25.0] | 100.00 | fully_funded |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 659604 | 5625.0 | 10000.0 | Weaving | Arts | not_provided | BT | Bhutan | USD | 14.0 | 210 | female | irregular | short_term_loan | (0.999, 25.0] | 56.25 | avg_funding |
| 660788 | 1975.0 | 1975.0 | Home Energy | Personal Use | not_provided | PS | Palestine | USD | 27.0 | 39 | female | monthly | medium_term_loan | (25.0, 61.0] | 100.00 | fully_funded |
| 661718 | 800.0 | 1600.0 | Furniture Making | Manufacturing | not_provided | HT | Haiti | HTG | 13.0 | 27 | female | irregular | short_term_loan | (0.999, 25.0] | 50.00 | avg_funding |
| 671151 | 0.0 | 25.0 | Livestock | Agriculture | not_provided | KE | Kenya | KES | 13.0 | 0 | female | monthly | short_term_loan | (0.999, 25.0] | 0.00 | not_funded |
| 671174 | 0.0 | 25.0 | Games | Entertainment | not_provided | KE | Kenya | KES | 13.0 | 0 | female | monthly | short_term_loan | (0.999, 25.0] | 0.00 | not_funded |
3899 rows × 16 columns
Note
Just about 3 % of of applications with no use were not funded.
#What sectors are fully_funded
df_7 = df.groupby('sector')['loan_status'].apply(lambda x: (x=='fully_funded').sum()).reset_index(name='count').sort_values(['count'],ascending=False)
df_7.head(5)
| sector | count | |
|---|---|---|
| 0 | Agriculture | 160999 |
| 6 | Food | 127300 |
| 11 | Retail | 112921 |
| 12 | Services | 40966 |
| 2 | Clothing | 29572 |
#what sectors are not funded
df_7 = df.groupby('sector')['loan_status'].apply(lambda x: (x=='not_funded').sum()).reset_index(name='count').sort_values(['count'],ascending=False)
df_7.head(5)
| sector | count | |
|---|---|---|
| 6 | Food | 682 |
| 11 | Retail | 617 |
| 0 | Agriculture | 593 |
| 12 | Services | 589 |
| 2 | Clothing | 202 |
#what countries have applications that were not funded
df_7 = df.groupby('country')['loan_status'].apply(lambda x: (x=='not_funded').sum()).reset_index(name='count').sort_values(['count'],ascending=False)
df_7.head(5)
| country | count | |
|---|---|---|
| 80 | United States | 837 |
| 34 | Kenya | 526 |
| 54 | Pakistan | 242 |
| 22 | El Salvador | 223 |
| 11 | Cambodia | 170 |
Findings
From my above analysis the following can be noted
Conclusion
df_c = df.groupby(by='country',as_index=False).agg(number_of_applications=('loan_status','size')).sort_values(['number_of_applications'],ascending=False)
df_c.head(5)
| country | number_of_applications | |
|---|---|---|
| 59 | Philippines | 154323 |
| 34 | Kenya | 73554 |
| 22 | El Salvador | 39845 |
| 11 | Cambodia | 27067 |
| 54 | Pakistan | 26475 |
# PLOT 1 TO BE EVALUATED
fig = px.scatter_geo(df_c, locations="country",
size="number_of_applications",
locationmode = 'country names',
color= 'country',
size_max=30,
title=" Fig 2.0 : Distribution of KIVA Projects accross the world 2016"
)
fig.update_layout(plot_bgcolor="white")
fig.show()
Note
From our area map we see that most of our applications came from South and Latin America, Asia and Africa . Are applications in these regions sector specific? This brings me to my next question. What sector has the most funded in the top three countries.
#Philipines
df_4 = df.groupby('sector')['country'].apply(lambda x: (x=='Philippines').sum()).reset_index(name='count').sort_values(['count'],ascending=False)
df_4.head()
| sector | count | |
|---|---|---|
| 11 | Retail | 52510 |
| 6 | Food | 41864 |
| 0 | Agriculture | 35158 |
| 8 | Housing | 4697 |
| 13 | Transportation | 4547 |
#Kenya
df_5 = df.groupby('sector')['country'].apply(lambda x: (x=='Kenya').sum()).reset_index(name='count').sort_values(['count'],ascending=False)
df_5.head()
| sector | count | |
|---|---|---|
| 0 | Agriculture | 32883 |
| 6 | Food | 13763 |
| 11 | Retail | 10094 |
| 12 | Services | 5408 |
| 2 | Clothing | 4667 |
#El Salvador
df_6 = df.groupby('sector')['country'].apply(lambda x: (x=='El Salvador').sum()).reset_index(name='count').sort_values(['count'],ascending=False)
df_6.head()
| sector | count | |
|---|---|---|
| 0 | Agriculture | 14008 |
| 6 | Food | 8307 |
| 8 | Housing | 5965 |
| 11 | Retail | 5666 |
| 2 | Clothing | 1905 |
Conclusion
From the above tables, I notice that the countries with the highest numbers of applications all fall around the same the sectors. This is proof of what people in these developing areas are in most need of.
There are therefore no irregulaties here and will be moving on with my analysis.
df_s = df.groupby(by='sector',as_index=False).agg(applications=('sector','size')).sort_values(['applications'],ascending=False)
df_s
| sector | applications | |
|---|---|---|
| 0 | Agriculture | 174624 |
| 6 | Food | 135359 |
| 11 | Retail | 123186 |
| 12 | Services | 44844 |
| 2 | Clothing | 32554 |
| 8 | Housing | 30155 |
| 4 | Education | 30128 |
| 10 | Personal Use | 25533 |
| 13 | Transportation | 15392 |
| 1 | Arts | 11938 |
| 7 | Health | 9190 |
| 3 | Construction | 6263 |
| 9 | Manufacturing | 6204 |
| 5 | Entertainment | 829 |
| 14 | Wholesale | 634 |
barplot4 = px.bar(data_frame=df_s,
x="sector",
y="applications",
color="sector",
title="Fig 3.0: Number of Applications per Sector 2016",
template= "plotly_dark",
labels={"applications":"Total number of applications"}
)
barplot4.show()
Note
From the above table it is no doubt that the top three sectors from where we have applications are :Agriculture,Food and Retail as these are the main sources of income and also a main need in the areas with the highest applications. To prove this let us take agriculture and see the top countries that need funding for agriculture.
#Agriculture
df_7 = df.groupby('country')['sector'].apply(lambda x: (x=='Agriculture').sum()).reset_index(name='count').sort_values(['count'],ascending=False)
df_7
| country | count | |
|---|---|---|
| 59 | Philippines | 35158 |
| 34 | Kenya | 32883 |
| 22 | El Salvador | 14008 |
| 11 | Cambodia | 10542 |
| 58 | Peru | 6585 |
| ... | ... | ... |
| 50 | Namibia | 0 |
| 44 | Mauritania | 0 |
| 39 | Lesotho | 0 |
| 37 | Lao People's Democratic Republic | 0 |
| 0 | Afghanistan | 0 |
87 rows × 2 columns
Conclusion
We see our dataset till this point is relatively homogenous and is spread evenly accross the board.
df_cf = df.groupby(by='sector',as_index=False).agg({'funded_amount': np.sum}).sort_values(['funded_amount'],ascending=False)
df_cf.round(2)
| sector | funded_amount | |
|---|---|---|
| 0 | Agriculture | 132669060.0 |
| 6 | Food | 114770835.0 |
| 11 | Retail | 90523555.0 |
| 12 | Services | 42879735.0 |
| 2 | Clothing | 34750195.0 |
| 4 | Education | 29701870.0 |
| 8 | Housing | 20191325.0 |
| 10 | Personal Use | 12692225.0 |
| 1 | Arts | 11778255.0 |
| 13 | Transportation | 9805775.0 |
| 7 | Health | 9159825.0 |
| 3 | Construction | 6320365.0 |
| 9 | Manufacturing | 5355900.0 |
| 5 | Entertainment | 1023095.0 |
| 14 | Wholesale | 918900.0 |
# PLOT 2 TO BE EVALUATED
import plotly.graph_objects as go
from plotly.offline import iplot
trace1 = go.Scatter(
mode='lines+markers',
x = df_cf['sector'],
y = df_cf['funded_amount'],
name="total loan per sector",
marker_color='chocolate'
)
trace2 = go.Bar(
x = df_s['sector'],
y = df_s['applications'],
name="total applications per sector",
yaxis='y2',
marker_color ='darkcyan',
marker_line_width=1.5,
marker_line_color='rgb(8,48,107)',
opacity=0.5
)
data = [trace1, trace2]
layout = go.Layout(
title_text='Fig 4: Applications per sector and total investment 2016',
template= "simple_white",
yaxis=dict(
range = [900000, 140000000],
side = 'right'
),
yaxis2=dict(
overlaying='y',
anchor='y3',
)
)
fig = go.Figure(data=data, layout=layout)
iplot(fig, filename='multiple-axes-double')
Note
From our results above, we can notice that there is no abnormaly here. The sectors with the highest number of applications received the highest funds in 2016.
In order to enquire further i want to find out what sector received the highest investment in one project.This will continue further to find out in what sector were lenders more predisposed to invest?
df_mx = df.groupby(by='sector',as_index=False).agg({'funded_amount': np.max}).sort_values(['funded_amount'],ascending=False)
df_mx.round(2)
| sector | funded_amount | |
|---|---|---|
| 0 | Agriculture | 100000.0 |
| 2 | Clothing | 50000.0 |
| 3 | Construction | 50000.0 |
| 4 | Education | 50000.0 |
| 7 | Health | 50000.0 |
| 9 | Manufacturing | 50000.0 |
| 11 | Retail | 50000.0 |
| 12 | Services | 50000.0 |
| 14 | Wholesale | 50000.0 |
| 1 | Arts | 40000.0 |
| 6 | Food | 35000.0 |
| 13 | Transportation | 35000.0 |
| 8 | Housing | 24450.0 |
| 10 | Personal Use | 13425.0 |
| 5 | Entertainment | 10000.0 |
#Is there any peculiarity with projects that cost 50,000 USD dollars and above?
df.loc[(df.funded_amount >= 50000) ,:]
| funded_amount | loan_amount | activity | sector | use | country_code | country | currency | term_in_months | lender_count | borrower_genders | repayment_interval | term_cat | term_label | rate_of_financing(%) | loan_status | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 34196 | 50000.0 | 50000.0 | Renewable Energy Products | Retail | to buy and sell Barefoot Power's Solar Lightin... | PE | Peru | USD | 14.0 | 1446 | male | irregular | short_term_loan | (0.999, 25.0] | 100.0 | fully_funded |
| 43182 | 50000.0 | 50000.0 | Renewable Energy Products | Retail | To buy and sell Barefoot Power's solar lightin... | KE | Kenya | USD | 16.0 | 1491 | male | bullet | short_term_loan | (0.999, 25.0] | 100.0 | fully_funded |
| 53634 | 50000.0 | 50000.0 | Renewable Energy Products | Retail | To buy and sell Barefoot Power solar lighting. | UG | Uganda | USD | 14.0 | 1581 | female | bullet | short_term_loan | (0.999, 25.0] | 100.0 | fully_funded |
| 70499 | 100000.0 | 100000.0 | Agriculture | Agriculture | create more than 300 jobs for women and farmer... | HT | Haiti | USD | 75.0 | 2986 | female | irregular | long_term_loan | (61.0, 121.0] | 100.0 | fully_funded |
| 126839 | 50000.0 | 50000.0 | Agriculture | Agriculture | to buy and plant resin producing pine trees. T... | MX | Mexico | USD | 144.0 | 586 | male, male, male, male, male, male, male, female | irregular | extra_long_term_loan | (121.0, 158.158] | 100.0 | fully_funded |
| 163727 | 50000.0 | 50000.0 | Agriculture | Agriculture | to fund its growing loan book and further deve... | KE | Kenya | USD | 38.0 | 1343 | female | bullet | medium_term_loan | (25.0, 61.0] | 100.0 | fully_funded |
| 210975 | 50000.0 | 50000.0 | Agriculture | Agriculture | To work with 17 farming cooperatives to proces... | RW | Rwanda | USD | 8.0 | 1302 | male | bullet | short_term_loan | (0.999, 25.0] | 100.0 | fully_funded |
| 223120 | 50000.0 | 50000.0 | Higher education costs | Education | to provide loans and career services for the l... | MX | Mexico | USD | 14.0 | 960 | male | irregular | short_term_loan | (0.999, 25.0] | 100.0 | fully_funded |
| 408295 | 50000.0 | 50000.0 | Clothing | Clothing | to set up a garment social business that will ... | AL | Albania | USD | 97.0 | 1626 | male | monthly | long_term_loan | (61.0, 121.0] | 100.0 | fully_funded |
| 408465 | 50000.0 | 50000.0 | Construction | Construction | not_provided | PE | Peru | USD | 73.0 | 1310 | female | irregular | long_term_loan | (61.0, 121.0] | 100.0 | fully_funded |
| 447374 | 50000.0 | 50000.0 | Agriculture | Agriculture | to increase smallholder farmers’ incomes by bu... | UG | Uganda | USD | 13.0 | 1555 | male | monthly | short_term_loan | (0.999, 25.0] | 100.0 | fully_funded |
| 490191 | 50000.0 | 50000.0 | Health | Health | To purchase raw materials in order to produce ... | GH | Ghana | USD | 14.0 | 1569 | male | irregular | short_term_loan | (0.999, 25.0] | 100.0 | fully_funded |
| 492809 | 50000.0 | 50000.0 | Agriculture | Agriculture | to expand weather, farming information and fin... | GH | Ghana | USD | 14.0 | 1481 | male | irregular | short_term_loan | (0.999, 25.0] | 100.0 | fully_funded |
| 494470 | 50000.0 | 50000.0 | Agriculture | Agriculture | To pay smallholder coffee farmers in rural Ken... | KE | Kenya | USD | 15.0 | 1441 | female | irregular | short_term_loan | (0.999, 25.0] | 100.0 | fully_funded |
| 496715 | 50000.0 | 50000.0 | Agriculture | Agriculture | to fund the harvest of seeds of 6,000 smallhol... | MG | Madagascar | USD | 22.0 | 1606 | male | irregular | short_term_loan | (0.999, 25.0] | 100.0 | fully_funded |
| 509048 | 50000.0 | 50000.0 | Poultry | Agriculture | to purchase chicken feed & a delivery vehicle ... | TZ | Tanzania | USD | 14.0 | 1765 | female | irregular | short_term_loan | (0.999, 25.0] | 100.0 | fully_funded |
| 523634 | 50000.0 | 50000.0 | Health | Health | to mitigate CO2 & household air pollution, whi... | MW | Malawi | USD | 24.0 | 1465 | male | irregular | short_term_loan | (0.999, 25.0] | 100.0 | fully_funded |
| 523659 | 50000.0 | 50000.0 | Electronics Sales | Retail | to train & equip 200 rural merchants in Mozamb... | MZ | Mozambique | USD | 14.0 | 1410 | female | irregular | short_term_loan | (0.999, 25.0] | 100.0 | fully_funded |
| 526100 | 50000.0 | 50000.0 | Renewable Energy Products | Retail | to distribute 200+ innovative & affordable pay... | ZM | Zambia | USD | 14.0 | 1550 | male | irregular | short_term_loan | (0.999, 25.0] | 100.0 | fully_funded |
| 538248 | 50000.0 | 50000.0 | Agriculture | Agriculture | to enable 5,000 additional small-holder farmer... | KE | Kenya | USD | 14.0 | 1589 | male | monthly | short_term_loan | (0.999, 25.0] | 100.0 | fully_funded |
| 541006 | 50000.0 | 50000.0 | Goods Distribution | Wholesale | to bolster logistics of affordable water distr... | HT | Haiti | USD | 14.0 | 1349 | male | irregular | short_term_loan | (0.999, 25.0] | 100.0 | fully_funded |
| 544548 | 50000.0 | 50000.0 | Health | Health | to provide community trauma services in South ... | SS | South Sudan | USD | 8.0 | 1609 | female | bullet | short_term_loan | (0.999, 25.0] | 100.0 | fully_funded |
| 548513 | 50000.0 | 50000.0 | Renewable Energy Products | Retail | to distribute solar home systems throughout ru... | ZW | Zimbabwe | USD | 13.0 | 720 | male | monthly | short_term_loan | (0.999, 25.0] | 100.0 | fully_funded |
| 563074 | 50000.0 | 50000.0 | Renewable Energy Products | Retail | to provide life-changing clean cookstoves and ... | KE | Kenya | USD | 14.0 | 1402 | female | irregular | short_term_loan | (0.999, 25.0] | 100.0 | fully_funded |
| 565733 | 50000.0 | 50000.0 | Agriculture | Agriculture | to pay 600 farming families 100% above market ... | EC | Ecuador | USD | 14.0 | 1689 | female | irregular | short_term_loan | (0.999, 25.0] | 100.0 | fully_funded |
| 583307 | 50000.0 | 50000.0 | Agriculture | Agriculture | to support 800+ farmers by improving their pro... | GT | Guatemala | USD | 20.0 | 1671 | male | monthly | short_term_loan | (0.999, 25.0] | 100.0 | fully_funded |
| 586970 | 50000.0 | 50000.0 | Renewable Energy Products | Retail | to generate income to over 600 fishermen in Ta... | TZ | Tanzania | USD | 14.0 | 782 | female | irregular | short_term_loan | (0.999, 25.0] | 100.0 | fully_funded |
| 604502 | 50000.0 | 50000.0 | Agriculture | Agriculture | to add value and jobs to the local economy by ... | BJ | Benin | USD | 12.0 | 1519 | male | irregular | short_term_loan | (0.999, 25.0] | 100.0 | fully_funded |
| 614869 | 50000.0 | 50000.0 | Furniture Making | Manufacturing | create jobs through environmentally-friendly m... | KE | Kenya | USD | 20.0 | 1094 | male | irregular | short_term_loan | (0.999, 25.0] | 100.0 | fully_funded |
| 614922 | 50000.0 | 50000.0 | Water Distribution | Services | to set up 13 new clean water businesses in nor... | GH | Ghana | USD | 14.0 | 1688 | female | irregular | short_term_loan | (0.999, 25.0] | 100.0 | fully_funded |
| 618264 | 50000.0 | 50000.0 | Farming | Agriculture | to provide income opportunities in remote Indo... | ID | Indonesia | USD | 14.0 | 1564 | male | irregular | short_term_loan | (0.999, 25.0] | 100.0 | fully_funded |
| 621860 | 50000.0 | 50000.0 | Renewable Energy Products | Retail | to distribute 400 pay-as-you-go solar home sys... | KE | Kenya | USD | 14.0 | 1170 | male | irregular | short_term_loan | (0.999, 25.0] | 100.0 | fully_funded |
| 631904 | 50000.0 | 50000.0 | Agriculture | Agriculture | double cashew nut export output and hire about... | CI | Cote D'Ivoire | USD | 10.0 | 1706 | female | irregular | short_term_loan | (0.999, 25.0] | 100.0 | fully_funded |
Note
From the tables above we see that the biggest project was worth 100,000 USD and in the Agricultural sector. Considerably so,as it was funded towards a big project with the creation about 300 jobs. I will therefore not consider this as an outlier but as an extreme investment.
When i look further down the list and compare the number of lenders per investment per sector, it brings me to an investor analysis and i would like to dig deeper.
In this section,my business question will be:
How much are lenders willing to invest? What is the propensity to invest?
It will be good for Kiva Organization to know what sectors its investors will be willing to put in more money. The Kiva sectores mandates a minimum of 25 USD investment per person. But how much further are investors willing to invest above this amount?
Let us note that this analysis will be based on how much investors are willing to invest per person per sector and not as a group.
Therefore to get an answer to my question, i will create a new column that shows how much money a lender contributed per project. That way i will be able to know averagely how much an investor is willing to pay for a project in a particular sector or in a particular country.
#New column "inv_per_lender"
df["inv_per_lender"]=round((df['funded_amount']/df['lender_count']),2)
df["inv_per_lender"].fillna(0, inplace = True)
df.head(5)
| funded_amount | loan_amount | activity | sector | use | country_code | country | currency | term_in_months | lender_count | borrower_genders | repayment_interval | term_cat | term_label | rate_of_financing(%) | loan_status | inv_per_lender | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 300.0 | 300.0 | Fruits & Vegetables | Food | To buy seasonal, fresh fruits to sell. | PK | Pakistan | PKR | 12.0 | 12 | female | irregular | short_term_loan | (0.999, 25.0] | 100.0 | fully_funded | 25.00 |
| 1 | 575.0 | 575.0 | Rickshaw | Transportation | to repair and maintain the auto rickshaw used ... | PK | Pakistan | PKR | 11.0 | 14 | female, female | irregular | short_term_loan | (0.999, 25.0] | 100.0 | fully_funded | 41.07 |
| 2 | 150.0 | 150.0 | Transportation | Transportation | To repair their old cycle-van and buy another ... | IN | India | INR | 43.0 | 6 | female | bullet | medium_term_loan | (25.0, 61.0] | 100.0 | fully_funded | 25.00 |
| 3 | 200.0 | 200.0 | Embroidery | Arts | to purchase an embroidery machine and a variet... | PK | Pakistan | PKR | 11.0 | 8 | female | irregular | short_term_loan | (0.999, 25.0] | 100.0 | fully_funded | 25.00 |
| 4 | 400.0 | 400.0 | Milk Sales | Food | to purchase one buffalo. | PK | Pakistan | PKR | 14.0 | 16 | female | monthly | short_term_loan | (0.999, 25.0] | 100.0 | fully_funded | 25.00 |
df_is = df.groupby(by='sector',as_index=False).agg(applications_2016=('country', 'size'), funded_2016=('funded_amount', 'sum'), inv_per_lender=('inv_per_lender','mean') )
df_is
| sector | applications_2016 | funded_2016 | inv_per_lender | |
|---|---|---|---|---|
| 0 | Agriculture | 174624 | 132669060.0 | 47.569026 |
| 1 | Arts | 11938 | 11778255.0 | 60.762197 |
| 2 | Clothing | 32554 | 34750195.0 | 78.995450 |
| 3 | Construction | 6263 | 6320365.0 | 56.376962 |
| 4 | Education | 30128 | 29701870.0 | 39.877026 |
| 5 | Entertainment | 829 | 1023095.0 | 48.978299 |
| 6 | Food | 135359 | 114770835.0 | 77.002489 |
| 7 | Health | 9190 | 9159825.0 | 51.929370 |
| 8 | Housing | 30155 | 20191325.0 | 39.647853 |
| 9 | Manufacturing | 6204 | 5355900.0 | 52.163453 |
| 10 | Personal Use | 25533 | 12692225.0 | 42.697106 |
| 11 | Retail | 123186 | 90523555.0 | 77.134103 |
| 12 | Services | 44844 | 42879735.0 | 62.618802 |
| 13 | Transportation | 15392 | 9805775.0 | 53.756249 |
| 14 | Wholesale | 634 | 918900.0 | 46.627666 |
Note
This will be better analysed with the use of a scatter plot.On this plot, the position of a sector will show us the number of applications and the total amount funded in 2016 for that sector. Furtehr more it will show us the sector where lenders are more incited to pay for an investment.
# PLOT 3 TO BE EVALUATED
import plotly.express as px
fig = px.scatter(data_frame=df_is,
x="applications_2016",
y="funded_2016",
color='sector',
size= "inv_per_lender",
size_max= 30,
title= " Fig 5.0 :Propensity to invest per sector KIVA 2016",
template="plotly_white",
labels={"funded_2016":"total invested in 2016", 'applications_2016':"total projects in 2016"}
)
fig.show()
Note
This scatter plot shows the total amount of money invested per sector on the KIVA platform in 2016.
The balls represent the sectors and the shape represent how much an investor is willing to pay for a project in that sector.
The sectors with a higher propensity to invest are Clothing, Retail and food. Is there a reason why Investors will invest more in this sectors? Let's find out.
#Question One
# What is the repayment structure in these sectors (Clothing,food,and retail)? This would help further confirm our analysis
#We want to know why the propensity to invest in higher in this sectors.
df_rs = df.groupby(by=['repayment_interval',"sector"],as_index=False).agg(applications_2016=('sector', 'count'))
df_rs.head(60)
#For lack of time the following calculations were done on paper
#Bullet Form repayment
#agriculture 23%
#clothing 3.1%
#Retail 3.9%
#Food 3.3%
| repayment_interval | sector | applications_2016 | |
|---|---|---|---|
| 0 | bullet | Agriculture | 41050 |
| 1 | bullet | Arts | 1454 |
| 2 | bullet | Clothing | 1037 |
| 3 | bullet | Construction | 293 |
| 4 | bullet | Education | 2156 |
| 5 | bullet | Entertainment | 155 |
| 6 | bullet | Food | 4583 |
| 7 | bullet | Health | 138 |
| 8 | bullet | Housing | 875 |
| 9 | bullet | Manufacturing | 451 |
| 10 | bullet | Personal Use | 3134 |
| 11 | bullet | Retail | 4896 |
| 12 | bullet | Services | 5052 |
| 13 | bullet | Transportation | 406 |
| 14 | bullet | Wholesale | 56 |
| 15 | irregular | Agriculture | 45716 |
| 16 | irregular | Arts | 4605 |
| 17 | irregular | Clothing | 13447 |
| 18 | irregular | Construction | 1547 |
| 19 | irregular | Education | 4288 |
| 20 | irregular | Entertainment | 226 |
| 21 | irregular | Food | 74882 |
| 22 | irregular | Health | 1654 |
| 23 | irregular | Housing | 6285 |
| 24 | irregular | Manufacturing | 2436 |
| 25 | irregular | Personal Use | 3522 |
| 26 | irregular | Retail | 72486 |
| 27 | irregular | Services | 12433 |
| 28 | irregular | Transportation | 6360 |
| 29 | irregular | Wholesale | 164 |
| 30 | monthly | Agriculture | 87819 |
| 31 | monthly | Arts | 5864 |
| 32 | monthly | Clothing | 17936 |
| 33 | monthly | Construction | 4412 |
| 34 | monthly | Education | 23684 |
| 35 | monthly | Entertainment | 448 |
| 36 | monthly | Food | 55797 |
| 37 | monthly | Health | 7385 |
| 38 | monthly | Housing | 22995 |
| 39 | monthly | Manufacturing | 3317 |
| 40 | monthly | Personal Use | 18877 |
| 41 | monthly | Retail | 45680 |
| 42 | monthly | Services | 27206 |
| 43 | monthly | Transportation | 8614 |
| 44 | monthly | Wholesale | 414 |
| 45 | weekly | Agriculture | 39 |
| 46 | weekly | Arts | 15 |
| 47 | weekly | Clothing | 134 |
| 48 | weekly | Construction | 11 |
| 49 | weekly | Food | 97 |
| 50 | weekly | Health | 13 |
| 51 | weekly | Retail | 124 |
| 52 | weekly | Services | 153 |
| 53 | weekly | Transportation | 12 |
Note
For our three sectors of interest , they are have a lower bullet repayment rate of less than 4% as opposed to Agriculture with a bullet repayment rate of 23%.
Are there any other reason why Investors will be likely to invest in the above three sectors? Our second and third questions will be.
Question two: How long is the duration of loans in these sectors?
Question three: What activities make up the biggest part of these sectors?
Question Two- How long is the duration of loans in these sectors?
To note:
#Agriculture
df_ag = df.groupby('term_cat')['sector'].apply(lambda x: (x=='Agriculture').sum()).reset_index(name='count').sort_values(['count'],ascending=False)
df_ag
#For lack of time ,the following calculations were done on paper
# Short term loan 90%
#Medium term loan 9.6%
#long term 0.15%
| term_cat | count | |
|---|---|---|
| 0 | short_term_loan | 157336 |
| 1 | medium_term_loan | 16933 |
| 2 | long_term_loan | 277 |
| 3 | extra_long_term_loan | 78 |
#Clothing
df_ag = df.groupby('term_cat')['sector'].apply(lambda x: (x=='Clothing').sum()).reset_index(name='count').sort_values(['count'],ascending=False)
df_ag
#For lack of time ,the following calculations were done on paper
# Short term loan 94%
#Medium term loan 5.4%
#long term 0.06%
#extra long term 0
| term_cat | count | |
|---|---|---|
| 0 | short_term_loan | 30772 |
| 1 | medium_term_loan | 1760 |
| 2 | long_term_loan | 22 |
| 3 | extra_long_term_loan | 0 |
#Retail
df_ag = df.groupby('term_cat')['sector'].apply(lambda x: (x=='Retail').sum()).reset_index(name='count').sort_values(['count'],ascending=False)
df_ag
#For lack of time ,the following calculations were done on paper
# Short term loan 96%
#Medium term loan 3.2%
#long term 0.005%
#extra long term 0
| term_cat | count | |
|---|---|---|
| 0 | short_term_loan | 119212 |
| 1 | medium_term_loan | 3967 |
| 2 | long_term_loan | 7 |
| 3 | extra_long_term_loan | 0 |
Question 3: What activities make up the biggest part of these sectors?
#Clothing
df_ac = df_1.groupby('activity')['sector'].apply(lambda x: (x=='Clothing').sum()).reset_index(name='count').sort_values(['count'],ascending=False)
df_ac.head(5)
fig = px.pie(df_ac.head(5),
values='count',
names='activity',
title= "Fig 6.1: Activity distributed across the Clothing Sector"
)
fig.update_traces(textposition='inside')
fig.update_layout(
uniformtext_minsize=10, uniformtext_mode='hide')
fig.show()
#Retail
df_ar = df.groupby('activity')['sector'].apply(lambda x: (x=='Retail').sum()).reset_index(name='count').sort_values(['count'],ascending=False)
df_ar
fig = px.pie(df_ar.head(34),
values='count',
names='activity',
title= "fig 6.2 :Activity distributed across the Retail Sector"
)
fig.update_traces(textposition='inside')
fig.update_layout(
uniformtext_minsize=10, uniformtext_mode='hide')
fig.show()
#Food
df_af = df.groupby('activity')['sector'].apply(lambda x: (x=='Food').sum()).reset_index(name='count').sort_values(['count'],ascending=False)
df_af.head(20)
fig = px.pie(df_af.head(20),
values='count',
names='activity',
title= "fig 6.3: Activity distributed across the Food Sector"
)
fig.update_traces(textposition='inside')
fig.update_layout(
uniformtext_minsize=10, uniformtext_mode='hide')
fig.show()
From our scatter plot (Fig 5.0) there are few things which are by now not strange to us. We can clearly see the number of applications per sectors and the total amount funded, but what we can also see is the size of our balls. The size show us Investors' readiness to invest in a particular sector.
As we have seen above: Agriculture,Food and Retail have the highest loan applications and a corresponding high amount of funding. However the clothing ,Food and Retail sector are more attractive to investors than the agricultural sector.
Going into the details; an investor is willing to invest 78 USD per project in the clothing sector and 77 USD on projects in both the Retail and Food Sectors as opposed to 47 USD in the Agricultural sector.
The question here is why?
The repayment interval refers to how borrowers will pay back the loan granted to them. Will it be monthly,weekly,irregularly or in bullet form. Bullet refers to a form of payment where only the interest is paid back during the lifespan of the loan and theh principal will beb repaid as a lump sum at the end. Most investors do not like this type of investment especially if it comes with a low interest rate as is with th ecase with the principles of the KIVA organization: Loans to impevorished regions at low rates.
The clothing, retail and food sector have a bullet repayment rate of less than 4% as opposed to a 23% bullet repayment rate in the agricultural sector.
It is therefore of no wonder why investors will feel safer to invest in this sectors.
With every loan, there is always a risk that the lender will not be able to refund a loan. In this case the Kiva Organisation does not guarrantee a refund and always makes sure to let investor know about this risk.
In general we all know the risk associated with the agriculture business. It can be affected by natural disasters and diseases in the case of animal farming . In 2016 as we can see in fig 6.1 Clothing sales made up 68% of the total projects in the clothing sector. Therefore a huge part of investment in this sector was made up of clothes buy and resale. The world clohing business is one of the biggest and safest industry in the world,and naturally investors will feel safe investing.
In fig 6.2 General store make up about 50% of the retail sector. This is safe investment as the goods sold in a general store are mostly non perishable and investors are assured of their return on investments.
In fig 6.3 Food production and sales made up 20% of the projects in this sector and in general the activities in this sector are mostly for the sale of food. Food being a need and a basic commodity, there is a lower risk of business failure.
The duration of loan in these sectors were also analysed and from our discovery we can see that in general the Clothinga nd Retailsectors have relatively shorter duration than the Agricultural sector. Every investor would want to invest in a sector where the payback time is not long. In th eretail sector for example, short term loan make upü 96% of the total number of projects . it also has a very low rate of long term loans at 0.005%,and none of the projects in this sector last more than 10years. The same can be said with the clothing and food sector who follow closely behind.
However in the case of the Agricultural sector, 90% of its loans were short term ,9.6 % mid term,0.15% long term and 78 of it sprojects spanned more than 10 years.
We can see that in general the repayment duration in the retail,food and clothing sectors are relatively better off than that in the agricultural sectors.
Conclusion on EDA
To conclude, an investor investing in projects through the Kiva organization will be more willing to spend more than the bare minimum which is 25 USD in the Clothing, Retail and Food industries because of higher return rate and better repayment modalities.
On the other hand i am also aware that although this is a general view accross the board, it will be interesting to see how this dynamics change from one country to another. It is on this basis that my dashboard will be created .
Basierend auf den vorangegangenen Daten soll ein Dashboard erstellt werden.
Das Dashboard soll:
- mindestens 1 HTML component enthalten
- 1 Dash Core Component mit dem Daten ausgewählt werden können
- 1 Grafik, die sich in Abhängigkeit von der Datenauswahl ändert
Verwende folgenden Code zum Ausführen der App:
if name == 'main':
my_app.run_server(mode='inline', port=8091)
df_csi = df.groupby(by=['country','sector'],as_index=False).agg(applications_2016=('sector', 'count'), funded_2016=('funded_amount', 'sum'), inv_per_lender=('inv_per_lender','mean') )
df_csi
| country | sector | applications_2016 | funded_2016 | inv_per_lender | |
|---|---|---|---|---|---|
| 0 | Afghanistan | Arts | 2 | 14000.0 | 27.690000 |
| 1 | Albania | Agriculture | 719 | 976925.0 | 31.333060 |
| 2 | Albania | Arts | 5 | 8375.0 | 29.556000 |
| 3 | Albania | Clothing | 87 | 153925.0 | 31.759770 |
| 4 | Albania | Construction | 30 | 35325.0 | 31.784333 |
| ... | ... | ... | ... | ... | ... |
| 985 | Zimbabwe | Personal Use | 1 | 1400.0 | 53.850000 |
| 986 | Zimbabwe | Retail | 765 | 764500.0 | 39.301399 |
| 987 | Zimbabwe | Services | 230 | 209875.0 | 39.358391 |
| 988 | Zimbabwe | Transportation | 9 | 11550.0 | 29.873333 |
| 989 | Zimbabwe | Wholesale | 16 | 22300.0 | 33.176875 |
990 rows × 5 columns
Note
Before we go to our dropdown let us see a view of our data in Kenya.
import plotly.express as px
fig = px.scatter(data_frame=df_csi.loc[df_csi.country=="Kenya",:],
x="applications_2016",
y="funded_2016",
color='sector',
size= "inv_per_lender",
size_max= 30,
title= "Fig 5.1 : Propensity to invest per sector in Kenya 2016",
template="plotly_white",
labels={"funded_2016":"total invested in 2016", 'applications_2016':"total projects in 2016"}
)
fig.show()
Note
From the above plot we can see that in 2016 in Kenya, Agriculture was theh sector with the highest number of applied projects (32,000) and a total funding of 16,2 million USD with an investor paying approximately 38 USD per project.
The food sector comes far behind with a total number of projects at 13,000 ,total funding at 4.2 million, meanwhile investors were willing to invest 43 USD per project.
from jupyter_dash import JupyterDash
from dash import dcc
from dash import html
from dash.dependencies import Input, Output
# Dashboard Creation
external_stylesheet = ['https://codepen.io/chriddyp/pen/bWLwgP.css']
my_kiva_app = JupyterDash(__name__, external_stylesheets=external_stylesheet)
my_kiva_app.layout = html.Div([
html.H1("Kiva Country,Sector and Investor Analysis 2016",
style={"text-align":"center"}
),
html.Div([
html.Label("Choose country:",
style={'font-weight': 'bold'}),
html.Br(),
dcc.Dropdown(id="dd_kiva_plot",
options=df_csi.country.unique(),
style={ 'width': '250px'},
)
]),
dcc.Graph(id="g_kiva_scatter",
figure={},
className="ten columns"
),
dcc.Graph(id="g_kiva_bar",
figure={},
className="ten columns"
)
])
@my_kiva_app.callback(
Output("g_kiva_scatter","figure"),
Output("g_kiva_bar","figure"),
Input("dd_kiva_plot","value")
)
def update_kiva_figure(selected_country):
df_sel= df_csi.loc[df_csi["country"]==selected_country,:]
df_sel2=df_csi.loc[df_csi["country"]==selected_country,:]
g_kiva_scatter=px.scatter(data_frame=df_sel,
x="applications_2016",
y="funded_2016",
color='sector',
size= "inv_per_lender",
size_max= 30,
title="Investor behavior towards project vis a vis sector and country",
template="plotly_dark",
labels={"funded_2016":"total invested in 2016", 'applications_2016':"total projects in 2016"}
)
g_kiva_bar=px.bar(data_frame=df_sel,
x="sector",
y="applications_2016",
color='sector',
title="Applications per country",
template= "plotly_dark",
labels={"applications_2016":"Total projects 2016"}
)
return g_kiva_scatter,g_kiva_bar
#if __name__=="__main__":
# my_kiva_app.run_server(mode="external", port=8091)
if __name__=="__main__":
my_kiva_app.run_server(mode='inline', port=8103)
Note
The above dashboard has two graphs. One of them shows us Investors' propensity to invest per sector when a country is chosen. It also shows us the total amount of money that was invested in a given country and in a given sector in 2016.
and the second gives a clearer vision of how many applications KIVA received per countries .
This Dashboard gives a somewhat overall view of the business in 2016.
#Business Figures
#Number of Contributions
df_1.lender_count.sum()
13655103
#Number of Applications
len(df_1.index)
646833
More than 1.7 billion people around the world are unbanked and can’t access the financial services they need. Kiva is an international nonprofit, founded in 2005 in San Francisco, with a mission to expand financial access to help underserved communities thrive.
The goal of Kiva organisation is in their own words "We envision a financially inclusive world where all people hold the power to improve their lives."
They do this by crowdfunding loans and unlocking capital for the underserved, improving the quality and cost of financial services, and addressing the underlying barriers to financial access around the world.
In 2016 Kiva recived 646,833 applications from accros the globe with the most applications coming from :Asia,South and Lating America and Africa.They were able to finance these projects with the help 13,655,103 contributions at a funding rate of 92%.
Having now arrived at the end of our Analysis we can conclude the following:
All in all the KIVA is a very good platform for those who are seeking to invest and those who are in need of funds ,and this is proof that there are still good people in the world.
from plotly.offline import init_notebook_mode, iplot
init_notebook_mode(connected=True)
Important Notice
This Data set was gotten from teh KIVA website.